-
Notifications
You must be signed in to change notification settings - Fork 764
[BACKPORT 25.10] Add spot interruption tracking to trace records (#6606) #6674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: STABLE-25.10.x
Are you sure you want to change the base?
[BACKPORT 25.10] Add spot interruption tracking to trace records (#6606) #6674
Conversation
bf2119b to
db6e9a4
Compare
plugins/nf-amazon/src/test/nextflow/cloud/aws/batch/AwsBatchTaskHandlerTest.groovy
Outdated
Show resolved
Hide resolved
Track and report spot/preemptible instance interruptions for cloud batch executors. Changes: - Add `numSpotInterruptions` transient field to TraceRecord - AWS Batch: detect spot interruptions by checking status reason pattern "Host EC2*" - Google Batch: detect spot preemptions via exit code 50001 in status events - Tower plugin: send numSpotInterruptions to Seqera Platform telemetry This enables workflow optimization and cost analysis by tracking how often tasks are retried due to spot instance reclamation. (cherry picked from commit eecd816) Signed-off-by: Lorenzo Fontana <[email protected]>
db6e9a4 to
b62f0e6
Compare
pditommaso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good, just not merge yet
| def trace = handler.getTraceRecord() | ||
| then: | ||
| 1 * handler.isCompleted() >> false | ||
| 2 * handler.isCompleted() >> false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure to understand why this changed to 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because another call is added to isCompleted in the getNumSpotInterruptions method.
Since here we are testing getTraceRecord() method which already have one call to isCompleted()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the resolution here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is one call here
nextflow/plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy
Line 958 in eecd816
| def result = super.getTraceRecord() |
and another call is added here
nextflow/plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy
Line 961 in eecd816
| result.numSpotInterruptions = getNumSpotInterruptions(jobId) |
Both are in getTraceRecord() method
we can remove one from getNumSpotInterruptions, it a separate method, so if it called somewhere else where isCompleted is not called then that can lead to an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think this check !isCompleted() is not really needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/test/nextflow/cloud/aws/batch/AwsBatchTaskHandlerTest.groovy
Outdated
Show resolved
Hide resolved
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy
Outdated
Show resolved
Hide resolved
Signed-off-by: Munish Chouhan <[email protected]>
plugins/nf-google/src/test/nextflow/cloud/google/batch/GoogleBatchTaskHandlerTest.groovy
Outdated
Show resolved
Hide resolved
Signed-off-by: Munish Chouhan <[email protected]>
Signed-off-by: Munish Chouhan <[email protected]>
Signed-off-by: Munish Chouhan <[email protected]>
Signed-off-by: Munish Chouhan <[email protected]>
Adds
numSpotInterruptionsfield to trace records for Tower/Platform telemetry:(cherry picked from commit eecd816)